Evaluating PU Learning Based on Associative Classification Algorithm
Yang Jianlin, Liu Yang
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
|
Abstract
[Objective] This paper examines PU (positive and unlabeled) learning with the associative classification algorithm CBA.
[Methods] First, we relabeled α% of the positive examples as unlabeled and combined them with the negative examples to construct the corpus. Then, we classified examples using all positive class association rules. Finally, we evaluated the reliability of the class association rules with relative confidence.
[Results] We set α to 0%, 30%, 60%, and 90%. Compared with CBA, the proposed PU learning algorithm increased the AUC by 6.21%, 11.15%, 13.50%, and 16.56%, respectively. Compared with POSC4.5, the AUC increased by 11.27%, 15.03%, 12.22%, and 7.37%, respectively.
[Limitations] We did not adjust the confidence of the class association rules according to the estimated proportion of positive examples. The classification accuracy of the proposed PU learning algorithm gradually decreased as α increased. We did not address the redundant rules generated by the CBA algorithm.
[Conclusions] The proposed PU learning algorithm outperformed the CBA and POSC4.5 algorithms.
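The experimental setup in the abstract (hiding α% of the positives, scoring with positive class association rules, and comparing by AUC) can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names `hide_positives`, `positive_rules`, `pu_score`, and `auc`, the thresholds `min_sup=0.1` and `min_conf=0.5`, the rule-length limit, and the toy transactions are all hypothetical. The scoring rule (highest confidence among matching positive rules) is a simplified stand-in for CBA-style rule matching. Because the abstract does not define the paper's relative-confidence measure, the sketch uses ordinary rule confidence computed on the PU labels instead.

```python
import random
from itertools import combinations

# Illustrative sketch only: the helper names, thresholds, and scoring rule
# below are assumptions, not the method published in the paper.


def hide_positives(examples, alpha, seed=0):
    """Relabel alpha% of the positives as unlabeled (PU label 0).

    examples: list of (itemset, true_label), true_label 1 = positive, 0 = negative.
    Returns a list of (itemset, pu_label, true_label) triples.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, (_, y) in enumerate(examples) if y == 1]
    hidden = set(rng.sample(pos_idx, round(len(pos_idx) * alpha / 100)))
    return [(items, 0 if i in hidden else y, y)
            for i, (items, y) in enumerate(examples)]


def positive_rules(data, min_sup=0.1, min_conf=0.5, max_len=2):
    """Mine rules 'antecedent -> positive' from the PU labels (assumed thresholds).

    Support = fraction of all examples that contain the antecedent and are
    labeled positive; confidence = that count divided by the antecedent's
    coverage. Plain confidence stands in for the paper's relative confidence.
    """
    n = len(data)
    items = sorted({it for itemset, _, _ in data for it in itemset})
    rules = []
    for k in range(1, max_len + 1):
        for ante in combinations(items, k):
            a = frozenset(ante)
            covered = [pu for itemset, pu, _ in data if a <= itemset]
            if not covered:
                continue
            n_pos = sum(covered)
            if n_pos / n >= min_sup and n_pos / len(covered) >= min_conf:
                rules.append((a, n_pos / len(covered)))
    return rules


def pu_score(itemset, rules):
    """Score = highest confidence among matching positive rules, 0.0 if none."""
    return max((conf for ante, conf in rules if ante <= itemset), default=0.0)


def auc(scores, labels):
    """Rank-based AUC: P(positive scored above negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy transactions (hypothetical data): sets of items with true labels.
    toy = [({"a", "b"}, 1), ({"a", "c"}, 1), ({"a", "b", "c"}, 1), ({"a"}, 1),
           ({"b", "d"}, 0), ({"c", "d"}, 0), ({"d"}, 0), ({"b", "c"}, 0)]
    for alpha in (0, 30, 60, 90):
        data = hide_positives(toy, alpha)
        rules = positive_rules(data)
        # For brevity, score the same transactions the rules were mined from.
        scores = [pu_score(items, rules) for items, _, _ in data]
        truth = [y for _, _, y in data]
        print(f"alpha={alpha}%  AUC={auc(scores, truth):.3f}")
```

The AUC is computed against the true labels, so it measures how well rules mined from PU labels still rank the hidden positives above the negatives. A faithful replication would use the paper's relative-confidence definition and a held-out test split rather than the training transactions.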
Received: 12 June 2017
Published: 27 November 2017