Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (12): 92-100    DOI: 10.11925/infotech.2096-3467.2017.0955
Original Article
Self-Explainable Reduction Method for Mixed Feature Data Modeling
Siwei Jiang1,2, Zhenping Xie1,2, Meijie Chen1,2, Ming Cai3
1School of Digital Media, Jiangnan University, Wuxi 214122, China
2Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi 214122, China
3Center of Informatization Development and Management, Jiangnan University, Wuxi 214122, China
Abstract  

[Objective] This paper aims to mine data that combine continuous numeric features with label features. [Methods] We propose a self-explainable reduction model to represent such data. The model uses a new reduction objective to create an adaptive discrete division of each continuous data dimension. [Results] We evaluated the new model on standard datasets and found that it performed better than existing methods. [Limitations] The computational efficiency of the proposed method is limited and cannot yet meet the demands of large-scale data mining. [Conclusions] The proposed model is an innovative and practical way to model mixed feature data.
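
The abstract does not state the reduction objective, so the following is only a minimal sketch of what a "discrete division" of a continuous dimension looks like. It substitutes plain equal-frequency binning, a standard technique, for the paper's adaptive method. The function names, the sample records, and the choice of three bins are all illustrative assumptions, not taken from the paper.

```python
from bisect import bisect_right

def equal_frequency_cuts(values, n_bins):
    """Return up to n_bins - 1 cut points splitting `values` into roughly equal-sized bins.

    Stand-in for the paper's adaptive division; NOT the authors' objective.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins]
        if not cuts or cut > cuts[-1]:  # skip duplicate cut points
            cuts.append(cut)
    return cuts

def discretize(values, cuts):
    """Map each continuous value to the index of the interval it falls in."""
    return [bisect_right(cuts, v) for v in values]

# Hypothetical mixed-feature records: (continuous feature, label feature)
records = [(0.2, "a"), (1.5, "b"), (3.1, "a"), (4.8, "b"), (7.0, "a"), (9.9, "b")]

continuous = [x for x, _ in records]
cuts = equal_frequency_cuts(continuous, n_bins=3)
bins = discretize(continuous, cuts)

# After discretization every feature is symbolic, so label-oriented
# mining methods can treat both columns uniformly.
symbolic = [(f"bin{b}", label) for b, (_, label) in zip(bins, records)]
print(cuts, bins, symbolic)
```

An adaptive method such as the one the abstract describes would choose the cut points by optimizing its own objective rather than fixing the number of bins in advance.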

Key words: Mixed Feature Data; Self-Explainable Reduction; Data Modeling; Data Mining
Received: 22 September 2017      Published: 29 December 2017

Cite this article:

Siwei Jiang, Zhenping Xie, Meijie Chen, Ming Cai. Self-Explainable Reduction Method for Mixed Feature Data Modeling. Data Analysis and Knowledge Discovery, 2017, 1(12): 92-100.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0955     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I12/92
