Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (12): 92-100    DOI: 10.11925/infotech.2096-3467.2017.0955
Original Article
Self-Explainable Reduction Method for Mixed Feature Data Modeling
Jiang Siwei1,2, Xie Zhenping1,2, Chen Meijie1,2, Cai Ming3
1School of Digital Media, Jiangnan University, Wuxi 214122, China
2Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi 214122, China
3Center of Informatization Development and Management, Jiangnan University, Wuxi 214122, China
Abstract  

[Objective] This paper aims to mine data that contain both continuous numeric features and label features. [Methods] We propose a self-explainable reduction model to represent such data. The model uses a new reduction objective to create an adaptive discrete division of each continuous data dimension. [Results] On standard datasets, the new model performed better than existing methods. [Limitations] The computational efficiency of the proposed method is limited and cannot yet meet the demands of large-scale data mining. [Conclusions] The proposed model is a novel and practical way to model mixed feature data.

Key words: Mixed Feature Data; Self-Explainable Reduction; Data Modeling; Data Mining
Received: 22 September 2017      Published: 29 December 2017
ZTFLH:  TP393  

Cite this article:

Jiang Siwei,Xie Zhenping,Chen Meijie,Cai Ming. Self-Explainable Reduction Method for Mixed Feature Data Modeling. Data Analysis and Knowledge Discovery, 2017, 1(12): 92-100.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0955     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I12/92

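| Dataset | Continuous attributes | Label attributes | Total attributes | Instances |
|---|---|---|---|---|
| glass | 9 | 0 | 9 | 214 |
| wine-quality white | 11 | 0 | 11 | 4,897 |
| wine-quality red | 11 | 0 | 11 | 1,599 |
| dermatology | 1 | 33 | 34 | 366 |
| ionosphere | 32 | 2 | 34 | 351 |
| adult | 5 | 9 | 14 | 32,562 |
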
| Dataset | Naïve Bayes | Proposed method + Naïve Bayes | K-means + Naïve Bayes | FCM + Naïve Bayes |
|---|---|---|---|---|
| glass | 49.53% | 59.14±4.60% | 61.92±4.12% | 62.43±3.75% |
| wine-quality white | 61.55% | 67.97±3.31% | 66.14±2.99% | 65.87±3.32% |
| wine-quality red | 55.35% | 54.00±2.03% | 58.16±0.98% | 58.38±0.95% |
| dermatology | 96.99% | 96.94±0.11% | 96.83±0.18% | 96.78±0.11% |
| ionosphere | 82.62% | 86.85±1.70% | 84.63±1.70% | 84.47±1.63% |
| adult | 88.69% | 92.47±0.82% | 89.78±0.63% | 91.34±0.80% |

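
As an illustration of the kind of baseline compared above, the following is a minimal sketch of a K-means + Naïve Bayes pipeline: each continuous attribute is clustered in one dimension, each value is replaced by the index of its nearest centroid, and a categorical Naïve Bayes classifier is trained on the resulting bins. It is not the paper's self-explainable reduction method, and it is not the authors' experimental code. The function and class names (`kmeans_1d`, `discretize`, `CategoricalNB`, `fit`, `predict`), the evenly spaced centroid initialisation, the Laplace smoothing, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def kmeans_1d(values, k, iters=50):
    """Cluster scalar values into k groups and return the sorted centroids."""
    lo, hi = min(values), max(values)
    # Assumption: deterministic start with centroids evenly spaced over the range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def discretize(value, centroids):
    """Map a continuous value to the index of its nearest centroid."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


class CategoricalNB:
    """Naive Bayes over discrete bins with Laplace (add-one) smoothing."""

    def fit(self, rows, labels, n_bins):
        self.n_bins = n_bins
        self.total = len(labels)
        self.class_counts = Counter(labels)
        self.counts = defaultdict(int)  # (feature index, bin, class) -> count
        for row, label in zip(rows, labels):
            for f, b in enumerate(row):
                self.counts[(f, b, label)] += 1
        return self

    def predict(self, row):
        def log_posterior(label):
            lp = math.log(self.class_counts[label] / self.total)
            for f, b in enumerate(row):
                lp += math.log((self.counts[(f, b, label)] + 1)
                               / (self.class_counts[label] + self.n_bins))
            return lp
        return max(self.class_counts, key=log_posterior)


def fit(X, y, k):
    """Discretize every column with 1-D k-means, then train the classifier."""
    centroids = [kmeans_1d([row[f] for row in X], k) for f in range(len(X[0]))]
    binned = [[discretize(v, centroids[f]) for f, v in enumerate(row)]
              for row in X]
    return centroids, CategoricalNB().fit(binned, y, k)


def predict(centroids, model, row):
    """Discretize one raw row with the learned centroids and classify it."""
    return model.predict([discretize(v, centroids[f]) for f, v in enumerate(row)])


# Toy data (hypothetical): two continuous attributes, two classes.
X = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5], [5.0, 30.0], [5.2, 29.0], [4.8, 31.0]]
y = ["a", "a", "a", "b", "b", "b"]
centroids, model = fit(X, y, k=2)
print(predict(centroids, model, [1.1, 10.5]))
print(predict(centroids, model, [5.1, 30.5]))
```

In this sketch the number of bins `k` is fixed in advance and shared by all attributes. That is a simplification chosen here; the abstract describes the proposed model as creating an adaptive discrete division for each continuous dimension instead.
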
| Rule set | Rule complexity value |
|---|---|
| 1 | 9.5996±0.0074 |
| 2 | 16.2276±0.0158 |
| 3 | 20.8350±0.0140 |
| 4 | 29.9389±0.0211 |
| 5 | 29.6517±0.0445 |

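| Preset rule No. | Result of the proposed algorithm (cross entropy) | Ideal result (cross entropy) | Degree of difference $\gamma$ |
|---|---|---|---|
| 1 | 0.2184±0.0710 | 0.1774±1.1532e-04 | 0.2309±0.4006 |
| 2 | 0.3947±0.0880 | 0.2640±2.1624e-04 | 0.4996±0.3260 |
| 3 | 0.2840±0.0743 | 0.2689±1.6490e-04 | 0.2617±0.1062 |
| 4 | 0.3309±0.0514 | 0.3554±1.9549e-04 | 0.1528±0.0940 |
| 5 | 0.2871±0.0325 | 0.3542±3.2526e-04 | 0.1929±0.0845 |
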
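
The table above reports results as cross entropy. For readers unfamiliar with the measure, here is a minimal sketch of the standard definition $H(p, q) = -\sum_i p_i \log q_i$ for two discrete distributions. How the paper builds the distributions it compares, which logarithm base it uses, and how the degree of difference $\gamma$ is computed are not stated on this page, so none of that is reproduced here. The function name `cross_entropy`, the natural logarithm, and the `eps` floor are assumptions of this example.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i), natural logarithm.

    p and q are discrete distributions over the same outcomes.
    eps (an assumption of this sketch) floors q_i to avoid log(0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p_i == 0 contribute nothing and are skipped.
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy distributions (hypothetical values, not taken from the paper).
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
print(cross_entropy(p, q))  # cross entropy of q relative to p
print(cross_entropy(p, p))  # equals the entropy of p, the minimum over q
```

By Gibbs' inequality, `cross_entropy(p, q)` is never smaller than `cross_entropy(p, p)`, so a lower value means that `q` matches `p` more closely.
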
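| Semantic feature | Label L1 | Label L2 | Label L3 | Label Le (no record) |
|---|---|---|---|---|
| F1 | 67.77% | 31.83% | 0.40% | / |
| F2 | 41.97% | 56.63% | / | 1.40% |
| F3 | 46.69% | 43.57% | 9.74% | / |
| F4 | 29.52% | 42.37% | 28.11% | / |
| F5 | 46.29% | 50.90% | / | 2.81% |
| F6 | 20.98% | 46.69% | 32.33% | / |
| F7 | 42.87% | 43.88% | 13.25% | / |
| F8 | 47.79% | 35.54% | / | 16.67% |
| F10 | 69.48% | 30.52% | / | / |
| F11 | 43.07% | 25.30% | / | 31.63% |
| F12 | 82.03% | 17.97% | / | / |
| F13 | 85.14% | 14.86% | / | / |
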