Data Analysis and Knowledge Discovery

Data Archive for Research Projects in Population Health

Wu Sizhu, Qian Qing, Zhou Wei, Zhong Ming, Wang Anran, Xiu Xiaolei, Gou Huan, Li Zanmei, Li Jiao, Fang An

Data Analysis and Knowledge Discovery. 2020, 4(12): 2-13. https://doi.org/10.11925/infotech.2096-3467.2020.0954

[Objective] This study focuses on the design and implementation of the Population Health Data Archive (PHDA), aiming to support data curation for government-funded research projects. [Methods] First, we analyzed the data curation characteristics of research projects on population health. Then, we constructed a data archive to meet their urgent needs. Our system includes a flexible and scalable framework, as well as user-friendly functional modules. [Results] The PHDA supports project registration, data collection, high-speed transmission of big data, secure preservation, assignment of unique dataset identifiers, effective storage, access control and voucher issuance. In 2019, our system managed 292 datasets for 14 projects from the National Special Program on Basic Works for Science and Technology. [Limitations] The PHDA could be optimized with more data semantics and deep learning technologies (i.e., intelligent data analysis services). [Conclusions] The PHDA could effectively curate and disseminate shared research data in the field of national population health.

k-Anonymity Algorithm of Multi-Branch-Tree Forest Based on Recognition Rate

Chen Xianlai, Luo Xiao, Liu Li, Li Zhongmin, An Ying

Data Analysis and Knowledge Discovery. 2020, 4(12): 14-25. https://doi.org/10.11925/infotech.2096-3467.2020.0952

[Objective] This paper tries to improve the efficiency of the k-anonymity algorithm and the quality of published data. [Methods] Based on recognition rates and a multi-branch-tree forest structure, we designed a new k-anonymity algorithm (MFBRR). It reviewed the data bottom-up according to the properties of the generalization tree and calculated the recognition rates. Then, we selected the target leaf nodes to prune the tree, which reduced the information loss. Finally, the MFBRR-γ algorithm was proposed based on parallel computing and multi-thread processing. [Results] We evaluated our algorithms on hierarchical precision and running time using the “Adult” data sets. The hierarchical precisions of MFBRR and MFBRR-γ were 0.97 and 0.88, respectively. It took the MFBRR and MFBRR-γ algorithms 1,457 minutes and 12.08 minutes (γ=100), respectively, to process 30,000 data sets. The MFBRR algorithm achieved a hierarchical precision of 0.93 with health care data. [Limitations] We only examined our models with two data sets. [Conclusions] The proposed algorithms could reduce the information loss due to anonymization and improve the quality of published data.
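
For readers unfamiliar with the property this abstract targets, the following is a minimal sketch of a k-anonymity check combined with one generalization step. It is not the MFBRR algorithm; the records, the field names and the helper `generalize_age` are hypothetical and only illustrate why generalizing quasi-identifiers trades information for anonymity.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a range (one step up a generalization hierarchy)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy records; "age" and "sex" act as quasi-identifiers.
records = [
    {"age": 23, "sex": "F", "diagnosis": "flu"},
    {"age": 27, "sex": "F", "diagnosis": "asthma"},
    {"age": 31, "sex": "M", "diagnosis": "flu"},
    {"age": 38, "sex": "M", "diagnosis": "diabetes"},
]
generalized = [generalize_age(r) for r in records]

print(is_k_anonymous(records, ["age", "sex"], k=2))
print(is_k_anonymous(generalized, ["age", "sex"], k=2))
```

The raw records fail the check because every exact age is unique, while the generalized records pass it at the cost of coarser age values. Measures such as hierarchical precision quantify that cost.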

A Review of Medical Decision Supports Based on Knowledge Graph

Zhu Chaoyu, Liu Lei

Data Analysis and Knowledge Discovery. 2020, 4(12): 26-32. https://doi.org/10.11925/infotech.2096-3467.2020.0953

[Objective] This paper systematically reviews knowledge graph-based applications that support medical decisions, aiming to expand similar interdisciplinary research. [Coverage] A total of 39 articles were retrieved from computer science conferences and Web of Science with the keywords “knowledge graph reasoning” and “medical decision support”. [Methods] We reviewed the development of medical decision support from the perspectives of traditional and evidence-based medicine, as well as computer-assisted and knowledge graph-assisted systems. [Results] Medical knowledge graphs and reasoning significantly changed medical decision support systems, alleviated the stress facing physicians, improved diagnosis efficiency, and reduced misdiagnosis. [Limitations] This article did not provide in-depth analysis of the reviewed models. [Conclusions] The medical knowledge graph is the “brain” of a clinical decision support system, and knowledge graph reasoning helps the brain utilize relevant knowledge. We need to construct more comprehensive medical knowledge graphs and improve their reasoning algorithms.

Extracting Clinical Scale Information and Identifying Trial Cohorts with Semantic Alignment

Yang Lin, Huang Xiaoshuo, Wang Jiayang, Li Jiao

Data Analysis and Knowledge Discovery. 2020, 4(12): 33-44. https://doi.org/10.11925/infotech.2096-3467.2020.0951

[Objective] This study develops a method to extract clinical scale information based on semantic alignment, aiming to identify potential cohorts and improve data-driven clinical research. [Methods] First, we analyzed the features of the National Institutes of Health Stroke Scale (NIHSS) in clinical trials and real-world electronic medical records. Then, we proposed an extraction method for clinical scale information based on semantic alignment. Finally, we examined our model with data from ClinicalTrials.gov and the open electronic medical record dataset MIMIC-III. [Results] The F1 values for the extracted NIHSS total score and item scores were 0.9535 and 0.9267, respectively. We effectively identified patients who met the NIHSS criteria. [Limitations] More research is needed to examine this method with other clinical scales and in real-world trial recruitment scenarios. [Conclusions] The proposed method could effectively address the issue of semantic consistency facing clinical scale information.

Analyzing Sentiments and Dissemination of Misinformation on Public Health Emergency

Zhang Yipeng, Ma Jingdong

Data Analysis and Knowledge Discovery. 2020, 4(12): 45-54. https://doi.org/10.11925/infotech.2096-3467.2020.0959

[Objective] This paper examines misinformation on a public health emergency (i.e., the COVID-19 epidemic), aiming to reveal the public’s sentiments toward misinformation and its dissemination patterns. [Methods] We retrieved our data from Sina Weibo and categorized the relevant microblog posts using machine learning techniques. Then, we extracted the post topics with an LDA model and determined the emotional polarity of comments using a dictionary method. Finally, we used a t-test to compare the number of comments, shares and likes received by misinformation posts with different sentiments. [Results] We found that 46.28% of the retrieved posts contained misinformation. 59.32% of the posts with misinformation and 54.49% of the posts with accurate information elicited negative emotions among readers. On average, the misinformation posts with negative sentiments received more comments, shares and likes than those with positive sentiments (2.26, 2.68 and 3.29). [Limitations] We only examined COVID-19 related posts and did not investigate the dissemination of accurate information. [Conclusions] Public health emergencies generate much misinformation. The sentiments of misinformation readers are more negative than those of readers of accurate information. Weibo posts with misinformation and negative sentiments attract more online participation.
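
The comparison step in this abstract can be illustrated with a two-sample t statistic. The sketch below assumes Welch's unequal-variance form, which the abstract does not specify, and the engagement counts are invented for illustration rather than taken from the study.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical comment counts for posts with negative vs. positive sentiment.
negative_posts = [5, 6, 7, 8]
positive_posts = [1, 2, 3, 4]

t_statistic = welch_t(negative_posts, positive_posts)
print(round(t_statistic, 3))
```

A positive statistic here means the first group has the larger mean. A full analysis would also derive degrees of freedom and a p-value, which this sketch omits.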

Recommending Knowledge for Online Health Community Users Based on Fuzzy Cognitive Map

Li He, Liu Jiayu, Shen Wang, Liu Rui, Jin Shuaiqi

Data Analysis and Knowledge Discovery. 2020, 4(12): 55-67. https://doi.org/10.11925/infotech.2096-3467.2020.0175

[Objective] This paper constructs a fuzzy cognitive map model, aiming to recommend context-driven knowledge for users of online health communities. [Methods] First, we extracted keywords from user comments and used them as concept nodes of the proposed model. Then, we calculated the absolute values of the weights between concept nodes based on the similarity of keyword co-occurrence. Third, we determined the semantic relationships among the keywords through literature reviews and expert collaboration. Finally, we built the fuzzy cognitive map and recommended disease-related knowledge using the changes in state values among nodes. [Results] Our new model’s precision, recall and F-measure were 0.286, 0.667 and 0.400, respectively. [Limitations] The number of user comments needs to be increased to improve the model’s performance. [Conclusions] The proposed model optimizes the recommendation mechanism of online health communities and provides better knowledge for patients.
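
To make the "change of state values among nodes" concrete, here is a minimal sketch of a fuzzy cognitive map iteration. It assumes a common update rule in which each concept keeps its own previous value, adds the weighted influence of the other concepts, and is squashed by a sigmoid. The abstract does not state which rule the authors used, and the three concepts and their weights are hypothetical.

```python
import math


def sigmoid(x, steepness=1.0):
    """Squash an activation into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def fcm_step(state, weights):
    """One synchronous update; weights[j][i] is the influence of concept j on concept i."""
    n = len(state)
    return [
        sigmoid(state[i] + sum(weights[j][i] * state[j] for j in range(n) if j != i))
        for i in range(n)
    ]


def run_fcm(state, weights, max_steps=100, tol=1e-5):
    """Iterate until the state stops changing (or max_steps is reached)."""
    for _ in range(max_steps):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state


# Hypothetical concepts: 0 = "cough", 1 = "fever", 2 = "flu knowledge".
weights = [
    [0.0, 0.0, 0.7],  # cough raises the relevance of flu knowledge
    [0.0, 0.0, 0.8],  # fever raises it too
    [0.0, 0.0, 0.0],
]
final_state = run_fcm([1.0, 1.0, 0.0], weights)
print([round(v, 3) for v in final_state])
```

In a recommender built this way, concepts whose state values rise the most after the user's keywords are activated would be the candidates to recommend.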

Automatic Extraction of Traditional Music Terms of Intangible Cultural Heritage

Liu Liu, Qin Tianyun, Wang Dongbo

Data Analysis and Knowledge Discovery. 2020, 4(12): 68-75. https://doi.org/10.11925/infotech.2096-3467.2020.0400

[Objective] This paper focuses on entity recognition for traditional music terms of intangible cultural heritage. [Methods] We constructed a corpus of national intangible cultural heritage projects based on the China Intangible Cultural Heritage Network, and built an entity recognition framework for traditional music terms based on CRF, LSTM, LSTM-CRF, and BERT. [Results] In the performance comparison, the BERT model outperformed the other models in recognizing traditional music terms, with an average F1 value of 91.77%. [Limitations] This study only extracts unique terms, and the training set is small. [Conclusions] The BERT-based entity recognition model is valid for automatically extracting traditional music terms of intangible cultural heritage. It can provide a reliable reference for related research on intangible cultural heritage.

Matrix Factorization Algorithm with Weighted Heterogeneous Information Network

Wang Gensheng, Pan Fangzheng

Data Analysis and Knowledge Discovery. 2020, 4(12): 76-84. https://doi.org/10.11925/infotech.2096-3467.2020.0327

[Objective] This paper integrates knowledge from a weighted heterogeneous information network into the matrix factorization algorithm, aiming to improve the quality of recommendation. [Methods] First, we constructed a heterogeneous information network and calculated the connection weights with an improved tanh function. Then, we chose the meta paths from the network and computed their weights based on information gain. Third, we computed the similarity of user interests to create a matrix, and integrated the matrix into our algorithm. [Results] We examined the proposed algorithm with the Hetrec2011-MovieLens-2k dataset. Compared with the traditional FunkSVD algorithm, the precision, recall and coverage of our algorithm increased by 4.4%, 5.4%, and 4.6%, respectively, while its root mean square error decreased by 0.06. [Limitations] The matrix factorization algorithm could not process massive data efficiently, and we did not investigate the drift of user interests. [Conclusions] The proposed algorithm could effectively generate recommendation results.
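
As background for the baseline named in this abstract, the sketch below shows a plain FunkSVD-style matrix factorization trained by stochastic gradient descent. It does not include the authors' weighted heterogeneous information network or user similarity matrix. The toy ratings and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def funk_svd(ratings, n_users, n_items, n_factors=2,
             lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn latent factors from (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean square error over the given ratings."""
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (squared / len(ratings)) ** 0.5


# Hypothetical (user, item, rating) triples on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (0, 2, 1), (1, 1, 3)]

P0, Q0 = funk_svd(ratings, n_users=3, n_items=3, epochs=0)  # untrained factors
P, Q = funk_svd(ratings, n_users=3, n_items=3)
print(round(rmse(ratings, P0, Q0), 3), round(rmse(ratings, P, Q), 3))
```

Side information such as a user similarity matrix is typically added as an extra term in the loss that this loop minimizes. How the paper does so is not described in the abstract.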

Sentiment Analysis of Cross-Domain Product Reviews Based on Feature Fusion and Attention Mechanism

Qi Ruihua, Jian Yue, Guo Xu, Guan Jinghua, Yang Mingxin

Data Analysis and Knowledge Discovery. 2020, 4(12): 85-94. https://doi.org/10.11925/infotech.2096-3467.2020.0535

[Objective] This paper tries to address the shortage of labelled data and to distinguish the weights of sentiment features in cross-domain sentiment classification. [Methods] We proposed a sentiment classification model for cross-domain product reviews based on feature fusion representation and the attention mechanism. First, this model integrated BERT and cross-domain word vectors to generate a unified cross-domain feature space. Then, it extracted the weights of global and local features through the attention mechanism. [Results] We examined our model with public review data from Amazon and found that the average accuracy of the proposed model was up to 95.93%, which was 9.33% higher than that of the existing model. [Limitations] More research is needed to evaluate our model with large-scale multi-domain data sets. [Conclusions] The proposed model could effectively analyze sentiment information.

Recommending Microblogs with User’s Interests and Multidimensional Trust

Han Kangkang, Xu Jianmin, Zhang Bin

Data Analysis and Knowledge Discovery. 2020, 4(12): 95-104. https://doi.org/10.11925/infotech.2096-3467.2020.0049

[Objective] This paper tries to improve the microblog recommendation method with the trust relationship between microblog users and target users, aiming to improve the recommendation results. [Methods] First, we calculated the comprehensive trust between microblog users and target users using a linear harmonic function of similarity, familiarity and influence. Then, we used the comprehensive trust degree as an adjustment factor to improve the content-based recommendation method. [Results] The F-Measure and DCG-Measure of the method were higher than those of the traditional ones. [Limitations] This method did not examine the indirect relationships among non-adjacent users. [Conclusions] The proposed method could recommend microblogs more effectively.

Improving Security Checks and Passenger Risk Evaluation with Classification of Airline Passengers

Feng Wengang, Jiang Zhaofeifan

Data Analysis and Knowledge Discovery. 2020, 4(12): 105-119. https://doi.org/10.11925/infotech.2096-3467.2020.0655

[Objective] This study tries to improve the efficiency of airport security checks and conducts dynamic analysis of risk evolution, aiming to provide better services for airline passengers. [Methods] We constructed an index system for civil aviation passengers based on deep learning, which determines the weight of each index through quantitative analysis. Then, we utilized system dynamics to simulate the impacts of safety management measures on passenger risks. [Results] The proposed method could precisely divert airline passengers based on risk analysis, which could reduce the waiting time at security checks. By increasing the social safety management, security measures and response coefficient by 30%, we could reduce the passenger risks by 61.65%, 29.87% and 29.87%, respectively. [Limitations] Our study did not include events with high confidentiality. [Conclusions] The proposed model could help airports launch differentiated security check services and evaluate the evolution of passenger risks.

Group Recommendation Algorithms Based on Implicit Representation Learning of Multi-attribute Ratings

Zhang Chunjin, Guo Shenghui, Ji Shujuan, Yang Wei, Yi Lei

Data Analysis and Knowledge Discovery. 2020, 4(12): 120-135. https://doi.org/10.11925/infotech.2096-3467.2020.0264

[Objective] This paper addresses the issues facing user representation learning due to the sparsity of user ratings, aiming to improve the performance of the recommendation algorithm. [Methods] We proposed a neural network-based method to learn the implicit representation of multi-attribute ratings from user groups and individual items. Then, we conducted two group-oriented recommendations by matching their learned representations with preferences as well as calculating the attraction of each item. [Results] We examined our method with the TripAdvisor data set and found that the accuracy and time performance of the proposed algorithms were better than those of the typical multi-attribute and group algorithms. Compared to the personalized recommendation algorithm, the accuracies of our algorithms were slightly worse, but their online and offline running times were reduced by more than 30% and 50%, respectively. The user group-based algorithm outperformed the item-based one. [Limitations] We generated virtual groups with a clustering algorithm, and their preferences were aggregated more effectively than those of real-world groups. [Conclusions] The proposed algorithms effectively improve the recommendation results.

Identifying Traffic Events from Weibo with Knowledge Graph and Target Detection

Sun Xinrui, Meng Yu, Wang Wenle

Data Analysis and Knowledge Discovery. 2020, 4(12): 136-147. https://doi.org/10.11925/infotech.2096-3467.2020.0596

[Objective] This paper identifies traffic events from Weibo (microblog) posts with the help of knowledge graph and target detection techniques, aiming to address traffic management issues. [Methods] First, we constructed a traffic knowledge graph and an event evolution graph based on open data. Then, we identified traffic events from microblog texts. Third, we processed microblog images with target detection to further improve the recognition accuracy for three types of events. [Results] We examined our method with 2018 microblog data on traffic in Qingdao. The precisions of traffic event detection based on texts and images were 94.55% and 95.53%, respectively. [Limitations] More research is needed to reduce the manual construction of the traffic knowledge graph and to improve the target detection algorithm. [Conclusions] The proposed method could help urban traffic management departments detect road incidents or traffic problems, and thus facilitate their decision-making.

25 December 2020, Volume 4 Issue 12
