Zhong Ming, Qian Qing, Zhou Wei, Wu Sizhu
Available online: 2025-07-04
[Objective] The National Population Health Data Center (NPHDC) stores data centrally, faces data security risks, has limited computing resources, and must meet users' urgent need to analyze and use its data. In view of these characteristics, this study explores approaches to building a data enclave suited to NPHDC, so as to provide users with a more efficient, secure, and flexible environment for data processing and analysis. [Methods] The types, characteristics, and implementation mechanisms of data enclaves, and their applicability to different scenarios, were summarized. Based on the data application characteristics of NPHDC, a big data analysis platform was built on a virtual enclave approach that integrates security enhancement, micro-isolation, and artificial intelligence technologies. [Results] The platform supports data review, data processing, data analysis and mining, and peer review of NPHDC data associated with users' published papers. It has completed review tasks for 32,000 datasets from more than 2,800 projects, as well as more than 10,000 data analysis tasks and more than 5,000 data processing tasks, with a data leakage rate of 0% and a resource utilization rate of 80%. [Limitations] The platform cannot yet support cross-institutional data sharing over decentralized storage. As NPHDC develops, data enclaves that incorporate privacy-preserving technologies such as secure multi-party computation and federated learning need to be explored. [Conclusions] The platform effectively meets the need for secure sharing and collaborative analysis of centrally stored population health data, and is of great significance for the security, sharing, and utilization of national population health scientific data.