[Objective] This paper aims to detect the compliance of privacy policies at the semantic level by integrating legal and regulatory knowledge. [Methods] Based on the Information Security Technology—Personal Information Security Specification (GB/T 35273-2020), we constructed a compliance evaluation index system covering integrity and semantic conflict, and annotated a corpus accordingly. We then used the K-BERT model, which embeds a knowledge graph, to build an integrity evaluation model and a consistency evaluation model that detects semantic conflicts. Finally, we applied both models to analyze the compliance of app privacy policies in 15 fields. [Results] We constructed a Chinese privacy policy corpus that passed the Kendall's W test, and the F1 scores of the integrity and consistency evaluation models reached 0.92 and 0.87, respectively. We analyzed 1,762 app privacy policies and found that policies in the fields of Audio-Video Entertainment, Purchase Comparison, Financial Planning, Sports and Health, and Automotive perform better on integrity, while those in the fields of Social Communication and Purchase Comparison are more semantically compliant with legal and regulatory requirements. [Limitations] Content behind the hyperlinks that appear in a few privacy policies was ignored, which may bias the compliance testing of some privacy policies. [Conclusions] The proposed model achieves automated analysis of privacy policy compliance across fields, which is significant for enhancing China's capacity to regulate how mobile apps handle user privacy data.
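The two statistics reported in the abstract, Kendall's W and the F1 score, can be illustrated with a minimal sketch. It is not the authors' code: the function names `kendalls_w` and `f1_score`, the input layout, and the no-ties simplification for Kendall's W are assumptions made here for illustration only.

```python
def kendalls_w(rank_matrix):
    """Kendall's W for m raters who each rank the same n items.

    Assumes (hypothetically) that every rater gives a strict ranking
    1..n with no ties, so no tie correction is applied.
    """
    m = len(rank_matrix)        # number of raters (annotators)
    n = len(rank_matrix[0])     # number of ranked items
    # Sum of the ranks each item received across all raters
    totals = [sum(row[i] for row in rank_matrix) for i in range(n)]
    mean_total = m * (n + 1) / 2  # average rank sum per item
    # Squared deviations of the rank sums from their mean
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Three raters in full agreement, then two raters in full disagreement
full_agreement = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
full_disagreement = kendalls_w([[1, 2, 3], [3, 2, 1]])
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

W ranges from 0 (no agreement among raters) to 1 (complete agreement). Annotation labels often contain ties, in which case a tie-corrected variant of the formula would be needed.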
Zhu Hou, Luo Yingjia, Chen Menglei, Ouyang Jiaxiang, Xiao Ying, Cai Yinan. Analyzing Compliance of Privacy Policy with Knowledge-Enhanced Deep Learning Model: From the Perspective of Integrity and Semantic Conflict. Data Analysis and Knowledge Discovery, 2024, 8(5): 46-58.