Evaluating PU Learning Based on Associative Classification Algorithm
Yang Jianlin, Liu Yang
School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] We examine PU learning with the associative classification algorithm CBA. [Methods] First, we relabeled α% of the positive examples as unidentified positive examples and pooled them with the negative examples to construct the unlabeled corpus. Then, we classified examples based on all positive class association rules. Finally, we evaluated the reliability of class association rules with relative confidence. [Results] With α set to 0%, 30%, 60%, and 90%, the AUC of the proposed PU learning algorithm exceeded that of CBA by 6.21%, 11.15%, 13.50%, and 16.56%, and that of POSC4.5 by 11.27%, 15.03%, 12.22%, and 7.37%, respectively. [Limitations] We did not adjust the confidence of the class association rules based on the estimated proportion of positive examples, and the classification accuracy of the proposed algorithm gradually decreased as α increased. We also did not investigate the redundant rules of the CBA algorithm. [Conclusions] The proposed PU learning algorithm outperformed the CBA and POSC4.5 algorithms.
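The experimental setup described above can be sketched in a few lines. The helper below is a hypothetical illustration, not the authors' code: it hides α% of the positives among the negatives to build the unlabeled pool, and scores a rule's relative confidence under one common formulation (confidence minus the class prior); the paper's exact sampling procedure and relative-confidence definition may differ.

```python
import random

def make_pu_split(positives, negatives, alpha, seed=0):
    """Relabel alpha% of the positive examples as unlabeled and pool
    them with the negatives (hypothetical helper; the paper does not
    specify its sampling details)."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    k = round(len(pos) * alpha / 100)
    hidden = pos[:k]            # positives whose labels are hidden
    labeled_pos = pos[k:]       # positives that stay labeled
    unlabeled = hidden + list(negatives)
    return labeled_pos, unlabeled

def relative_confidence(sup_xc, sup_x, prior_c):
    """conf(X -> c) minus the prior P(c): positive values mean the rule
    lifts the class probability above its base rate (one common
    formulation; the paper's definition may differ)."""
    return sup_xc / sup_x - prior_c
```

For example, with 10 positives, 2 negatives, and α = 30, the split leaves 7 labeled positives and an unlabeled pool of 5 examples; a rule covering 10 transactions, 8 of them positive, against a 0.5 prior gets relative confidence 0.3.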
Yang Jianlin, Liu Yang. Evaluating PU Learning Based on Associative Classification Algorithm. Data Analysis and Knowledge Discovery, 2017, 1(11): 12-18.
Pan Shirui, Zhang Yang, Li Xue, et al. Nearest Neighbor Algorithm for Positive and Unlabeled Learning with Uncertainty[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(9): 769-779. doi: 10.3778/j.issn.1673-9418.2010.09.001
[3] Schölkopf B, Platt J C, Shawe-Taylor J, et al. Estimating the Support of a High-dimensional Distribution[J]. Neural Computation, 2001, 13(7): 1443-1471. doi: 10.1162/089976601750264965
[4] Yu H, Han J, Chang K C C. PEBL: Positive Example Based Learning for Web Page Classification Using SVM[C]// Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2002: 239-248.
[5] He Jiazhen. Bayesian Classification for Positive Unlabeled Learning with Uncertainty[D]. Xianyang: Northwest A&F University, 2012.
[6] Zhang Xing. Research on Decision Tree for Mining Uncertain Data with PU-learning[D]. Xianyang: Northwest A&F University, 2012.
[7] Hu Haoji. A Classification Method for PU Problem Based on Data Distribution and Text Similarity[D]. Shanghai: East China Normal University, 2014.
[8] Zhang Bangzuo. A Study on Learning from Positive and Unlabeled Examples[D]. Changchun: Jilin University, 2009.
[9] Liu B, Lee W S, Yu P S, et al. Partially Supervised Classification of Text Documents[C]// Proceedings of the 19th International Conference on Machine Learning. 2002.
[10] Fung G P C, Yu J X, Lu H, et al. Text Classification Without Negative Examples Revisit[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 6-20. doi: 10.1109/TKDE.2006.16
[11] Xu Zhen. Semi-supervised Classification Based on KL Divergence[D]. Shanghai: Fudan University, 2010.
[12] Györfi L, Györfi Z, Vajda I. Bayesian Decision with Rejection[J]. Problems of Control and Information Theory, 1979, 8(5-6): 445-452.
[13] Chawla N V, Karakoulas G. Learning from Labeled and Unlabeled Data: An Empirical Study Across Techniques and Domains[J]. Journal of Artificial Intelligence Research, 2005, 23: 331-366. doi: 10.1613/jair.1509
[14] Jain S, White M, Radivojac P. Estimating the Class Prior and Posterior from Noisy Positives and Unlabeled Data[C]// Proceedings of the 30th Annual Conference on Neural Information Processing Systems. 2016.
[15] Natarajan N. Learning with Positive and Unlabeled Examples[D]. Austin: The University of Texas at Austin, 2015.
[16] Lee W S, Liu B. Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression[C]// Proceedings of the 20th International Conference on Machine Learning. 2003.
[17] Letouzey F, Denis F, Gilleron R. Learning from Positive and Unlabeled Examples[A]// Algorithmic Learning Theory[M]. Springer Berlin Heidelberg, 2000: 71-85.
[18] De Comité F, Denis F, Gilleron R, et al. Positive and Unlabeled Examples Help Learning[A]// Algorithmic Learning Theory[M]. Springer Berlin Heidelberg, 1999: 219-230.
[19] Liu B, Dai Y, Li X, et al. Building Text Classifiers Using Positive and Unlabeled Examples[C]// Proceedings of the 3rd IEEE International Conference on Data Mining. IEEE, 2003: 179-186.
[20] Liu B, Hsu W, Ma Y. Integrating Classification and Association Rule Mining[C]// Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. 1998.
[21] Huang Zaixiang, Zhou Zhongmei, He Tianzhong, et al. Improved Associative Classification Algorithm for Multiclass Imbalanced Datasets[J]. Pattern Recognition and Artificial Intelligence, 2015, 28(10): 922-929. doi: 10.16451/j.cnki.issn1003-6059.201510007
[22] Li Shuo. Research on Algorithm of Cost-sensitive Data Stream Classification Under PU Learning Scenario[D]. Xianyang: Northwest A&F University, 2015.
[23] Dong G, Zhang X, Wong L, et al. CAEP: Classification by Aggregating Emerging Patterns[A]// Discovery Science[M]. Springer Berlin Heidelberg, 1999: 30-42.
Liu Hongmei. Research of Association Rule Classification[J]. Computer Knowledge and Technology, 2009, 5(3): 535-536. doi: 10.3969/j.issn.1009-3044.2009.03.009
[30] Zaïane O, Antonie M L. On Pruning and Tuning Rules for Associative Classifiers[A]// Knowledge-based Intelligent Information and Engineering Systems[M]. Springer Berlin Heidelberg, 2005.