Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (11): 12-18    DOI: 10.11925/infotech.2096-3467.2017.0544
Original Article
Evaluating PU Learning Based on Associative Classification Algorithm
Yang Jianlin(), Liu Yang
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract  

[Objective] We examine PU learning with the associative classification algorithm CBA. [Methods] First, we relabeled α% of the positive examples as unidentified positives, which, together with the negative examples, formed the unlabeled portion of the training corpus. Then, we classified examples using all positive class association rules. Finally, we evaluated the reliability of each class association rule with its relative confidence. [Results] With α set to 0%, 30%, 60%, and 90%, the AUC of the proposed PU learning algorithm exceeded that of CBA by 6.21%, 11.15%, 13.50%, and 16.56%, respectively, and that of POSC4.5 by 11.27%, 15.03%, 12.22%, and 7.37%. [Limitations] We did not adjust the confidence of the class association rules based on the estimated proportion of positive examples, and the classification accuracy of the proposed algorithm gradually decreased as α increased. We also did not investigate the redundant rules of the CBA algorithm. [Conclusions] The proposed PU learning algorithm outperformed both the CBA and POSC4.5 algorithms.
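The corpus construction described in [Methods] can be sketched as follows: a fraction α of the positive examples is hidden and mixed with the negatives to form the unlabeled set. This is a minimal illustrative sketch, not the paper's actual implementation; the function name, argument order, and fixed seed are assumptions for the example.

```python
import random

def make_pu_split(positives, negatives, alpha, seed=42):
    """Hide a fraction alpha of the positive examples by moving them into
    the unlabeled set together with all negative examples.
    Returns (labeled_positives, unlabeled). Illustrative only."""
    rng = random.Random(seed)
    pos = positives[:]
    rng.shuffle(pos)
    n_hidden = int(round(alpha * len(pos)))
    hidden, labeled = pos[:n_hidden], pos[n_hidden:]
    unlabeled = hidden + negatives  # hidden positives mixed with negatives
    return labeled, unlabeled

# Example: 10 positives, 20 negatives, alpha = 0.3
labeled, unlabeled = make_pu_split(list(range(10)), list(range(100, 120)), 0.3)
print(len(labeled), len(unlabeled))  # 7 23
```

A PU learner then sees only `labeled` as positive and must treat `unlabeled` as a mixture of hidden positives and negatives.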

Key words: Associative Classification; PU Learning; CBA Algorithm
Received: 12 June 2017      Published: 27 November 2017
ZTFLH:  TP311 G35  

Cite this article:

Yang Jianlin, Liu Yang. Evaluating PU Learning Based on Associative Classification Algorithm. Data Analysis and Knowledge Discovery, 2017, 1(11): 12-18.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0544     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I11/12

Dataset       Samples   Pos : Neg
adult          48 842   11687 : 37155
breast            699     241 : 458
cylBands          540     228 : 312
hepatitis         155      32 : 123
horseColic        368     136 : 232
mushroom        8 124    3916 : 4208
pima              768     268 : 500
ticTacoe          958     332 : 626
bank           41 188    4640 : 36548
default        30 000    6636 : 23364
Dataset       Samples   Pos : Neg       Features (before binning)   Features (after binning)
adult          48 842   11687 : 37155   15                           97
bank           41 188    4640 : 36548   21                           84
breast            699     241 : 458     11 (incl. 1 useless)         20
cylBands          540     228 : 312     40 (incl. 4 useless)        124
default        30 000    6636 : 23364   24                          132
hepatitis         155      32 : 123     20                           56
horseColic        368     136 : 232     28                           85
mushroom        8 124    3916 : 4208    23                           90
pima              768     268 : 500      9                           38
ticTacoe          958     332 : 626     10                           29
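The "before/after binning" columns reflect that class association rules are mined over discrete items, so continuous attributes are first discretized and each (attribute, bin) pair becomes one item, which is why the feature count grows (e.g. pima: 9 to 38). The paper does not specify its discretization scheme here; the equal-width binning below is a generic illustrative sketch.

```python
def equal_width_bins(values, k):
    """Discretize a continuous attribute into k equal-width bins,
    returning a bin index per value (values at the top edge fall
    into the last bin). Illustrative sketch only."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [22, 35, 46, 58, 61, 70]
print(equal_width_bins(ages, 4))  # [0, 1, 2, 3, 3, 3]
```

Each resulting (attribute, bin-index) pair would then be treated as a distinct Boolean item by the rule miner.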
All values are AUC; α is the fraction of positive examples treated as unlabeled.

              α = 0                  α = 0.3                α = 0.6                α = 0.9
Dataset       CBA   POSC4.5  PU      CBA   POSC4.5  PU      CBA   POSC4.5  PU      CBA   POSC4.5  PU
adult         0.872  0.647  0.863    0.857  0.655  0.870    0.857  0.696  0.839    0.851  0.630  0.683
bank          0.723  0.829  0.753    0.721  0.788  0.773    0.715  0.819  0.768    0.633  0.824  0.751
breast        0.933  0.945  0.966    0.938  0.907  0.965    0.955  0.907  0.963    0.953  0.664  0.8641
cylBands      0.747  0.593  0.820    0.627  0.5    0.802    0.602  0.602  0.757    0.512  0.547  0.703
default       0.703  0.604  0.735    0.698  0.5    0.738    0.694  0.548  0.731    0.661  0.540  0.701
hepatitis     0.615  0.742  0.800    0.600  0.806  0.790    0.625  0.5    0.775    0.554  0.511  0.747
horseColic    0.749  0.790  0.887    0.701  0.790  0.811    0.512  0.790  0.667    0.503  0.790  0.595
mushroom      1.0    0.998  0.999    0.999  0.989  0.996    0.987  0.946  0.982    0.876  0.983  0.781
pima          0.751  0.681  0.760    0.701  0.711  0.753    0.668  0.703  0.702    0.399  0.594  0.684
ticTacoe      1.0    0.719  0.915    0.810  0.5    0.905    0.599  0.5    0.832    0.534  0.5    0.635
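The AUC values above can be computed without an explicit ROC curve via the Mann-Whitney formulation: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting one half. A minimal O(n·m) sketch (not the paper's implementation):

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count 0.5) -- the Mann-Whitney formulation."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.8, 0.3], [0.3, 0.1]))  # 0.875
```

For large datasets a rank-based O((n+m) log(n+m)) version is preferable, but the pairwise form above makes the definition explicit.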
Improvement (vs. CBA)        α = 0    α = 0.3   α = 0.6   α = 0.9
Average AUC increase         6.21%    11.15%    13.50%    16.56%

Improvement (vs. POSC4.5)    α = 0    α = 0.3   α = 0.6   α = 0.9
Average AUC increase         11.27%   15.03%    12.22%    7.37%
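These averages are consistent with reading them as the mean relative AUC gain over the ten datasets. For instance, the 6.21% figure (PU vs. CBA at α = 0) can be reproduced from the per-dataset AUC table:

```python
# CBA_AUC and PU_AUC at alpha = 0, taken from the results table above.
cba = {"adult": 0.872, "bank": 0.723, "breast": 0.933, "cylBands": 0.747,
       "default": 0.703, "hepatitis": 0.615, "horseColic": 0.749,
       "mushroom": 1.0, "pima": 0.751, "ticTacoe": 1.0}
pu  = {"adult": 0.863, "bank": 0.753, "breast": 0.966, "cylBands": 0.820,
       "default": 0.735, "hepatitis": 0.800, "horseColic": 0.887,
       "mushroom": 0.999, "pima": 0.760, "ticTacoe": 0.915}

# Mean relative AUC change of PU over CBA across the ten datasets.
gain = sum((pu[d] - cba[d]) / cba[d] for d in cba) / len(cba)
print(f"{gain:.2%}")  # 6.21%
```

The same computation with the POSC4.5 column reproduces the second table.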