Data Analysis and Knowledge Discovery

An Overview of Online Medical and Health Research: Hot Topics, Theme Evolution and Research Content

Jiang Wu,Guanjun Liu,Xian Hu

Data Analysis and Knowledge Discovery. 2019, 3(4): 2-12. https://doi.org/10.11925/infotech.2096-3467.2018.1063

[Objective] This paper reviews the literature on online health research and discusses its methods, hot topics, theme evolution, and trends, aiming to provide a reference for future exploration. [Coverage] We searched the Web of Science Core Collection with the keywords “Online medical” or “Online health” and retrieved 1,899 English articles. [Methods] This paper relies mainly on bibliometrics, cluster analysis, and vertical mapping analysis. [Results] Internet medical and health information, social media, online health communities, and electronic health records are the hot topics, which are influenced by the diversification of Internet information and the convenience of online communication. [Limitations] This paper neither retrieves data from other databases nor analyzes full texts, so the results might be biased. [Conclusions] Much remains to be explored in the field of online health. More research combining image recognition, deep learning, and neural networks can be conducted in the future.

The Inhibition Effect of Health Literacy on Health Risk Under the Internet Environment: An Empirical Study of Chronic Diseases Based on CHNS Data

Shijie Song,Yuxiang Zhao,Wenting Han,Qinghua Zhu

Data Analysis and Knowledge Discovery. 2019, 3(4): 13-21. https://doi.org/10.11925/infotech.2096-3467.2018.1026

[Objective] To explore the impact of health literacy on health risks and its implications for national health improvement in the Internet environment. [Methods] First, we reviewed the inhibition effect of health literacy on health risk and posited related hypotheses. Then we estimated the effect of the Internet environment on health literacy using a counterfactual design and propensity score matching. Finally, quantile regression was used to estimate the heterogeneous inhibition effect of health literacy on health risk. [Results] The empirical results showed that individuals exposed to the Internet environment are more likely to have higher health literacy, and that higher health literacy inhibits the health risks of chronic diseases. [Limitations] Owing to the constraints of secondary data, this study cannot further investigate the micro-level cognitive mechanism by which health literacy inhibits health risk. [Conclusions] Based on the empirical results, we proposed policy implications from multiple perspectives, such as information environment improvement, health literacy training, and health risk identification, to facilitate the realization of the “Healthy China” national strategy.

Research on User Information Requirement in Chinese Network Health Community: Taking Tumor-forum Data of Qiuyi as an Example

Quan Lu,Anqi Zhu,Jiyue Zhang,Jing Chen

Data Analysis and Knowledge Discovery. 2019, 3(4): 22-32. https://doi.org/10.11925/infotech.2096-3467.2018.1153

[Objective] This paper constructs a framework for mining the information needs of Chinese online health community users, adapted to the big data environment, and analyzes user information needs using tumor-forum data as an example. [Methods] The framework uses the Latent Semantic Indexing (LSI) model and MapReduce-based distributed text clustering to mine user information needs. The experimental data source is all Q&A data (24,305 in total) from the tumor forum of a Chinese online health community (qiuyi.cn). [Results] The proposed framework identifies five information needs of tumor-forum users and their proportions: treatment (43.3%), pathology and etiology (34.5%), examination (12.1%), postoperative (7.0%), and prevention (3.1%), along with the top 20 keywords of these needs. The analysis shows the growth of each need and a significant difference between domestic and foreign users. Gender differences are also significant: male users most need treatment information, while female users most need pathological and etiological information. Age differences are large too, and young people have the largest information needs (83.79%). [Limitations] Better threshold selection may exist, and the medical thesaurus is not perfect. The analysis of information needs is not multidimensional. [Conclusions] The proposed framework is feasible. The paper found that the distribution of needs changes from year to year and that users' information needs vary with age and gender.

Selection of Users’ Behaviors Towards Different Topics of Microblog on Public Health Emergencies

Lu An,Yanping Liang

Data Analysis and Knowledge Discovery. 2019, 3(4): 33-41. https://doi.org/10.11925/infotech.2096-3467.2018.1037

[Objective] This paper aims to reveal the relationship between microblog topics and user behaviors at different stages of public health emergencies. [Methods] We analyzed behavioral patterns among different topics and within a specific topic. An LDA topic model improved by the relevance formula was employed to extract the topics of microblog entries on public health emergencies. The cosine distances between microblog topics and the numbers of retweets, comments, and favorites, as well as those between each pair of behavior counts, were calculated to explore users’ behavior patterns towards the same or different topics. [Results] During public health emergencies, the evolutionary trends of retweets, comments, and favorites are roughly similar. Significant correlations exist between the counts of the three behaviors. The correlation coefficients between retweets and comments, comments and favorites, and retweets and favorites are 0.390, 0.274, and 0.180, respectively. Microblogs related to the topics of event progress, government responses, and knowledge dissemination are more likely to be commented on, while those related to the topics of public opinions and event measures are more likely to be retweeted. [Limitations] The generality of the conclusions needs to be examined with other cases. [Conclusions] User behaviors clearly differ across types of topics, which means different behaviors may occur among different topics and within a specific topic.
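
As a rough illustration of the cosine-distance step described in the abstract above, the sketch below compares a per-period topic-strength vector with behavior-count vectors. The function names and all numbers are hypothetical and are not taken from the paper; the paper's exact vector construction is not stated in the abstract.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical data: strength of one topic and behavior counts over four periods.
topic_strength = [0.1, 0.4, 0.3, 0.2]
retweets = [10, 40, 30, 20]   # moves with the topic
comments = [40, 10, 20, 30]   # moves against the topic

d_retweets = cosine_distance(topic_strength, retweets)
d_comments = cosine_distance(topic_strength, comments)
```

A smaller distance means the behavior count tracks the topic more closely over time; in this made-up data the retweet series is proportional to the topic strength, so its distance is essentially zero.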

Public Opinion Propagation and Evolution of Public Health Emergencies in Social Media Era: A Case Study of 2018 Vaccine Event

Lin Wang,Ke Wang,Jiang Wu

Data Analysis and Knowledge Discovery. 2019, 3(4): 42-52. https://doi.org/10.11925/infotech.2096-3467.2018.1061

[Objective] This paper investigates the rules of public opinion propagation and evolution in public health emergencies and proposes corresponding policies for the social media era. [Methods] Based on ELM, TAM, and life cycle theory, we established an influencing-factor model to explore how information publishers, information content, and information release time affect public opinion propagation in public health emergencies. [Results] The empirical analysis showed that discourse power is held by different interest groups in different periods of public opinion development. Information with greater timeliness and novelty spreads more widely. If official media show some subjectivity, their tweets are retweeted more. [Limitations] Only one case was empirically analyzed, and the compatibility of the model needs to be improved. [Conclusions] A model that jointly considers the identity of the information source, the quality of information content, and the life cycle explains well the rules of public opinion propagation and evolution in public health emergencies on social media platforms.

Research on Weibo Opinion Leaders Identification and Analysis in Medical Public Opinion Incidents

Jiang Wu,Yinghui Zhao,Jiahui Gao

Data Analysis and Knowledge Discovery. 2019, 3(4): 53-62. https://doi.org/10.11925/infotech.2096-3467.2018.1069

[Objective] This paper aims to identify Weibo opinion leaders and study their influence in medical public opinion incidents. [Methods] This paper integrates users' personal attributes, network characteristics, behavioral characteristics, and text features into a comprehensive index system to identify opinion leaders in different periods of medical public opinion incidents, and uses time difference correlation analysis to study the impact of opinion leaders' emotional tendency on public sentiment. [Results] Taking the 2018 vaccine event as a case, this paper verifies the effectiveness of the proposed opinion leader identification model. The results show that medical public opinion hotspots and the types of opinion leaders differ across periods, and that the attitudes of opinion leaders guide the emotions of the general public. [Limitations] We examined the proposed method only on the vaccine event data, and the model's generalization ability remains to be verified. [Conclusions] Compared with traditional evaluation indicators, the multi-feature opinion leader identification method proposed in this paper can better discover potential opinion leaders among grassroots users.

Analysis of Knowledge Sharing Behavior of Medical Professional Users in Online Health Communities Based on Social Capital and Motivation Theory

Yuxin Peng,Zhaohua Deng,Jiang Wu

Data Analysis and Knowledge Discovery. 2019, 3(4): 63-70. https://doi.org/10.11925/infotech.2096-3467.2018.0666

[Objective] Based on social capital theory and motivation theory, this paper constructs a research model and explores, from multiple dimensions, the factors affecting the willingness of medical professional users to share knowledge. [Methods] A questionnaire survey is used to collect data online, SPSS 20 is used for descriptive statistics and factor analysis, and a structural equation model is used to verify the hypotheses. [Results] Trust (β=0.10, P<0.05), shared vision (β=0.19, P<0.01), altruism (β=0.17, P<0.05), reputation (β=0.12, P<0.05), and altruism moderated by knowledge self-efficacy (β=0.13, P<0.05) have a significant positive impact on the willingness to share knowledge, but social interaction, identification, and reciprocity have no significant impact on the willingness to share knowledge (P>0.05). [Limitations] The reasons why some variables are not significant need further investigation. [Conclusions] The results provide theoretical evidence for research on medical professional users' knowledge sharing intention and a reference for online health community managers formulating knowledge sharing incentive mechanisms for medical professional users, which has practical significance.

Text Sentiment Classification Based on Deep Belief Network

Qingqing Zhang,Xingshi He,Huimin Wang,Shengjun Meng

Data Analysis and Knowledge Discovery. 2019, 3(4): 71-79. https://doi.org/10.11925/infotech.2096-3467.2018.0516

[Objective] This paper focuses on Chinese text sentiment classification based on a deep belief network, especially the parameter selection and performance analysis of the network. [Methods] Taking Chinese e-commerce reviews as the object of study, we extract unigram, bigram, POS, simple dependency label, sentiment score, and triple dependency features and use them as the input of a deep belief network, varying the number of layers and the number of inputs to compute the accuracy of sentiment classification. [Results] The results demonstrate that the triple dependency features yield better classification performance than the other features, but the number of hidden layers has no effect on classification accuracy. [Limitations] The methods were not tested or verified on other deep learning models. [Conclusions] Deep learning performs well for sentiment analysis, but how to set the parameters still needs further consideration.

Research on Disease Risk Factors on Structural Equation Model

Dongmei Mu,Hui Fa,Ping Wang,Jing Sun

Data Analysis and Knowledge Discovery. 2019, 3(4): 80-89. https://doi.org/10.11925/infotech.2096-3467.2018.0631

[Objective] This paper aims to use the structural equation model to analyze objective index data and explore the risk factors related to disease. [Methods] Based on literature research and linear correlation analysis, this paper extracts disease risk factors. Structural equation modeling was used to analyze these risk factors. A disease diagnosis model was constructed using the classification and regression tree (CART) algorithm, and the risk factors were qualitatively and quantitatively evaluated and compared using diagnostic models. [Results] Nine risk factors related to disease were discovered. In the quantitative evaluation, the indicators of the diagnosis model built on risk factors from structural equation modeling were at a high level, and its overall performance was better. [Limitations] The amount of experimental data is limited; future experiments can use more data. [Conclusions] Disease risk factors identified with the structural equation model can improve the early diagnosis rate of disease and can assist clinical decision-making.

An Under-sampling Ensemble Classification Algorithm Based on Fuzzy C-Means Clustering for Imbalanced Data

Lianjie Xiao,Mengrui Gao,Xinning Su

Data Analysis and Knowledge Discovery. 2019, 3(4): 90-96. https://doi.org/10.11925/infotech.2096-3467.2018.0533

[Objective] This paper tries to solve the problem of low accuracy on the minority class in binary classification tasks caused by class imbalance. [Methods] An under-sampling ensemble classification algorithm based on fuzzy c-means (FCM) clustering, ECFCM, is proposed for imbalanced data. The majority class samples are under-sampled with FCM clustering, and the cluster center samples are combined with all the minority samples into a balanced data set. A Bagging-based ensemble learning algorithm is then used to classify the balanced data sets. [Results] Matlab simulation experiments on four imbalanced datasets show that the ECFCM algorithm improves Acc, AUC, and F₁ by up to 5.75%, 13.84%, and 7.54%, respectively. [Limitations] Only some standard data sets are used to verify the effectiveness of ECFCM. A specific application would require targeted research on the classification algorithm. [Conclusions] The ECFCM algorithm performs well to a certain extent and helps improve binary classification accuracy on the minority class of imbalanced datasets.
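
The under-sampling step in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's ECFCM implementation: it assumes the number of clusters equals the number of minority samples, uses the FCM centers themselves as the majority representatives, and omits the Bagging stage. All function names and data are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means (weights u ** m).
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # Memberships are proportional to (squared distance) ** (-1 / (m - 1)).
        shift = 0.0
        for i in range(n):
            inv = [max(sq_dist(points[i], ctr), 1e-12) ** (-1.0 / (m - 1.0))
                   for ctr in centers]
            inv_sum = sum(inv)
            new_row = [v / inv_sum for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # stop once memberships no longer change
            break
    return centers, u

def balance_by_fcm(majority, minority, seed=0):
    """Replace the majority class by as many FCM centers as minority samples.

    Assumption: label 0 marks majority representatives, label 1 the minority.
    """
    centers, _ = fuzzy_c_means(majority, c=len(minority), seed=seed)
    return [(ctr, 0) for ctr in centers] + [(list(p), 1) for p in minority]

# Hypothetical toy data: 8 majority points, 2 minority points.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
            [9.8, 10.0], [10.1, 9.9], [10.0, 10.2], [9.9, 10.1]]
minority = [[5.0, 5.0], [5.2, 4.9]]
balanced = balance_by_fcm(majority, minority)
```

The balanced list can then be passed to any base classifier inside a bagging loop. Whether the paper keeps synthetic centers or the real samples nearest to them is not stated in the abstract.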

News Hotspots Discovery Method Based on Multi Factor Feature Selection and AFOA/K-means

Tingxin Wen,Yangzi Li,Jingshuang Sun

Data Analysis and Knowledge Discovery. 2019, 3(4): 97-106. https://doi.org/10.11925/infotech.2096-3467.2018.0757

[Objective] This paper aims to improve the efficiency and accuracy of hot topic discovery by studying feature reduction methods and clustering algorithms for news text. [Methods] Building on the traditional TF-IDF formula, four weighting features (symbol, part of speech, position, and length) are introduced to realize multi-factor feature selection. The Ameliorated Fruit fly Optimization Algorithm (AFOA) is constructed from four aspects: coding, fitness function, adaptive step length, and population fitness variance. AFOA is used to optimize the initial cluster centers of K-means, and the optimized K-means is used to find hot topics. Multi-factor feature selection is used to identify hot topics, and hot topics are ranked using TOPSIS. [Results] Experiments show that multi-factor feature selection and the AFOA/K-means algorithm each significantly improve the clustering effect, and verify the overall effectiveness of the proposed method. [Limitations] The method is only applicable to Chinese news texts. [Conclusions] The proposed method offers a new idea for research on Chinese news hotspot discovery.

Analysis of Knowledge Flow Based on Academic Social Networks: A Case Study of ScienceNet.cn

Xiaolan Wu,Chengzhi Zhang

Data Analysis and Knowledge Discovery. 2019, 3(4): 107-116. https://doi.org/10.11925/infotech.2096-3467.2018.1100

[Objective] This study aims to explore knowledge flow on academic social networks. [Methods] Taking ScienceNet.cn as a representative case, we first collect all the data on users’ research directions and friends. Then we use the simple correlation coefficient to measure the relation between the knowledge-flow distributions of users from different disciplines, and adopt the Louvain algorithm to detect the community structure among first-level disciplines. [Results] The simple correlation coefficient shows that the knowledge flows of different disciplines are similar to each other. The Louvain algorithm detects four knowledge-flow communities among first-level disciplines. [Limitations] We construct the knowledge flow network based only on friend relationships, without considering comment and recommendation relationships. [Conclusions] We find that “Life Science” and “Medical Science” show the most obvious disciplinary affinity on ScienceNet.cn. In addition, there are four main knowledge flow paths across discipline departments: “Earth Science - Life Science - Medical Science”, “Chemical Science - Engineering Materials - Mathematical Science - Information Science”, “Earth Science - Engineering Materials”, and “Information Science - Management Science”.
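
To make the correlation step in the abstract above concrete, the sketch below computes a correlation coefficient between the knowledge-flow distributions of two disciplines. It assumes that "simple correlation coefficient" means Pearson's r, which the abstract does not state explicitly; the discipline vectors are invented for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: number of friend links from users of one discipline
# to users of five target disciplines (same target order in both lists).
life_science = [120, 80, 15, 10, 5]
medical_science = [110, 95, 12, 8, 6]

r = pearson(life_science, medical_science)
```

A value of r close to 1 would indicate that the two disciplines distribute their friend links across target disciplines in a similar way.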

A Conditional Walk Quadripartite Graph Based Personalized Recommendation Algorithm

Yiwen Zhang,Chenkun Zhang,Anju Yang,Chengrui Ji,Lihua Yue

Data Analysis and Knowledge Discovery. 2019, 3(4): 117-125. https://doi.org/10.11925/infotech.2096-3467.2018.0662

[Objective] By mining the relation characteristics between users and items and between users and categories, this paper extracts user preferences to optimize the recommendation effect. [Methods] This paper extracts user ratings and item degree attributes, mines user preferences, and puts forward a walk condition for the User-Item bipartite graph. The Category-User-Project-Category quadripartite graph is established by mapping the User-Item-Category tripartite graph to the User-Category bipartite graph. A personalized recommendation method that captures user preferences through items and categories is proposed. [Results] Using the MovieLens ratings data set as the source data, we compare experiments based on the bipartite graph, weighted bipartite graph, tripartite graph, and quadripartite graph. The results show that precision, MAE, recall, and coverage are all improved by the proposed method. [Limitations] Because MovieLens lacks users' textual reviews of movies, it is hard to analyze user preferences semantically. [Conclusions] This research analyzes user preferences through user ratings and degree attributes, and finds that the quadripartite graph based on conditional walk achieves a good recommendation effect.
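
The idea of a conditional walk on a User-Item bipartite graph can be illustrated with the classic two-step mass-diffusion scheme, restricted to edges that satisfy a condition. The sketch below is not the paper's algorithm: the walk condition used here (rating at or above a threshold `min_rating`) is an assumption chosen for illustration, the category layers of the quadripartite graph are left out, and the ratings are made up.

```python
from collections import defaultdict

def recommend(ratings, target, min_rating=3.0, top_n=3):
    """Rank unseen items for `target` by two-step resource diffusion.

    ratings: dict mapping user -> dict mapping item -> rating.
    Only edges with rating >= min_rating may be walked (assumed condition).
    """
    # Keep only the edges that satisfy the walk condition.
    liked = {u: {i for i, r in items.items() if r >= min_rating}
             for u, items in ratings.items()}
    item_users = defaultdict(set)
    for u, items in liked.items():
        for i in items:
            item_users[i].add(u)

    # Step 1: each item the target liked spreads one unit evenly to its users.
    user_resource = defaultdict(float)
    for i in liked.get(target, ()):
        share = 1.0 / len(item_users[i])
        for u in item_users[i]:
            user_resource[u] += share

    # Step 2: each user spreads the received resource evenly to liked items.
    item_resource = defaultdict(float)
    for u, res in user_resource.items():
        share = res / len(liked[u])
        for i in liked[u]:
            item_resource[i] += share

    # Exclude everything the target has already rated, then sort by score.
    seen = set(ratings.get(target, {}))
    ranked = sorted(((i, s) for i, s in item_resource.items() if i not in seen),
                    key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "C": 4, "D": 2},
    "carol": {"B": 4, "C": 5, "E": 5},
}
recs = recommend(ratings, "alice")
```

In this toy data, item D is never reached because the only edge leading to it has a rating below the threshold, which is the effect a walk condition is meant to have.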

25 April 2019, Volume 3 Issue 4